Multimodal Retrieval as Learning to Rank

ColBERTv2-style late interaction instead of CLIP
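One way to read this note: CLIP compares a single pooled vector per item, whereas ColBERTv2 keeps one vector per token and scores with late interaction (MaxSim). The sketch below shows that scoring rule on toy vectors. Treating image patch embeddings as the "document tokens" is an assumption made for illustration, and the names `l2_normalize` and `maxsim_score` are hypothetical, not taken from any library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take its best cosine
    match among the document tokens, then sum those maxima."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


# Toy example: 2 text-token embeddings vs. 2 candidate sets of patch embeddings
# (assumed to be image patches; any per-token embeddings would work).
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_a = [[1.0, 0.0], [0.0, 2.0]]   # covers both query tokens
image_b = [[1.0, 0.0]]               # covers only the first query token

ranked = sorted(
    {"image_a": image_a, "image_b": image_b}.items(),
    key=lambda kv: maxsim_score(text_tokens, kv[1]),
    reverse=True,
)
```

Because each query token is matched independently, a candidate that covers more of the query ranks higher, which a single pooled CLIP vector cannot express directly.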

retrieval across multiple modalities

masking

incorporate prompting

focal loss instead of the SigLIP sigmoid loss
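A minimal sketch of what this swap could look like. SigLIP scores every image-text pair in a batch independently with a sigmoid, labelling matched pairs +1 and all others -1. A focal variant multiplies each pair's log-loss by (1 - p_t)^gamma, so easy pairs (mostly the many negatives) contribute less. The function names, the `gamma` default, and the omission of the class-balancing alpha term are assumptions for illustration. The `logits` matrix is assumed to already include SigLIP's learned temperature and bias.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def focal_sigmoid_loss(logits, gamma=2.0):
    """Pairwise sigmoid loss with focal modulation.

    logits[i][j] is the similarity logit between image i and text j.
    Diagonal entries are positives (label +1), the rest negatives (-1).
    With gamma = 0 this reduces to the plain pairwise sigmoid loss.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0
            log_pt = log_sigmoid(z * logits[i][j])  # log-prob of the true label
            pt = math.exp(log_pt)
            total += -((1.0 - pt) ** gamma) * log_pt
    return total / n  # normalise by batch size, as in the sigmoid loss


# Toy 2x2 batch: row 1 has a hard negative (0.5 should be strongly negative).
toy_logits = [[2.0, -1.0], [0.5, 3.0]]
plain = focal_sigmoid_loss(toy_logits, gamma=0.0)
focal = focal_sigmoid_loss(toy_logits, gamma=2.0)
```

With `gamma > 0` the total loss shrinks and the hard negative accounts for a larger share of it, which is the intended effect of focusing on difficult pairs.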